Abstract

We introduce several modifications of the partitioning schemes used in Hoare's quicksort
and quickselect algorithms, including ternary schemes which identify keys less or greater
than the pivot. We give estimates for the numbers of swaps made by each scheme. Our
computational experiments indicate that ternary schemes allow quickselect to identify all
keys equal to the selected key at little additional cost.
1 Introduction

Hoare's quicksort [Hoa62] and quickselect (originally called Find) [Hoa61] are among the
most widely used algorithms for sorting and selection. In our context, given an array
$x[1:n]$ of $n$ elements and a total order $<$, sorting means permuting the elements so that
$x_i \le x_{i+1}$ for $i = 1:n-1$, whereas for the simpler problem of selecting the $k$th smallest
element, the elements are permuted so that $x_i \le x_k \le x_j$ for $1 \le i \le k \le j \le n$.

Both algorithms choose a pivot element, say $v$, and partition the input into a left array
$x[1:a-1] \le v$, a middle array $x[a:b] = v$, and a right array $x[b+1:n] \ge v$. Then quicksort
is called recursively on the left and right arrays, whereas quickselect is called on the left
array if $k < a$, or the right array if $k > b$; if $a \le k \le b$, selection is finished.

This paper introduces useful modifications of several partitioning schemes. First, we
show that after exchanging $x_1$ with $x_n$ when necessary, the classic scheme of Sedgewick
[Knu98, §5.2.2] no longer needs an artificial sentinel. Second, it turns out that a simple
modification of another popular scheme of Sedgewick [BeM93, Prog. 3] allows it to handle
equal keys more efficiently; both schemes take $n$ or $n+1$ comparisons. Third, we describe
a scheme which makes just the $n-1$ necessary comparisons, as well as the minimum
number of swaps when the elements are distinct. This should be contrasted with Lomuto's
scheme [BeM93, Prog. 2], [CLRS01, §7.1], which takes $n-1$ comparisons but up to $n-1$
swaps. Hence we analyze the average numbers of swaps made by the four schemes when the
elements are distinct and in random order. The first three schemes take at most $n/4$ swaps
on average, whereas Lomuto's scheme takes up to $n-1$. Further, for the pivot selected
as the median of a sample of $2t+1$ elements, the first three schemes make asymptotically
$n/6$ swaps for $t = 0$, $n/5$ for $t = 1$, etc. (cf. §3.3.1), while Lomuto's scheme takes $(n-1)/2$;
the swap counts are similar when the pivot is Tukey's ninther [BeM93, CHT02, Dur03].

When equal keys occur, one may prefer a ternary scheme which produces a left array
with keys $< v$ and a right array with keys $> v$, instead of $\le v$ and $\ge v$ as do binary schemes.
Here only the Bentley-McIlroy scheme [BeM93] looks competitive, since Dijkstra's "Dutch
national flag" scheme [Dij76, Chap. 14] and Wegner's schemes [Weg85] are more complex.
However, the four schemes discussed above also have attractive ternary versions. Our first
scheme omits pointer tests in its key comparison loops, keeping them as fast as possible.
Our second scheme improves on another scheme of Sedgewick [Sed98, Chap. 7, quicksort]
(which needn't produce true ternary partitions; cf. §5.2). Our third scheme is a simple
modification of the Bentley-McIlroy scheme which makes $n-1$ comparisons; the original
version takes $n - 1/2$ on average (cf. Lem. 5.1), although $n-1$ was assumed in [Dur03].
Ternary versions of Lomuto's scheme seem to be less attractive. When many equal keys
occur, the Bentley-McIlroy scheme tends to make fewer swaps than the other schemes,
but it may needlessly swap equal keys with themselves, and its inner loops involve pointer
tests. Hence we introduce hybrid two-phase versions which eliminate vacuous swaps in the
first phase and pointer tests in the second phase.

Ternary schemes, although slower than their simpler binary counterparts, have at least
two advantages. First, quicksort's recursive calls aren't made on the equal keys isolated by
partitioning. Second, quickselect can identify all keys equal to the $k$th smallest by finding
two indices $k_- \le k \le k_+$ such that $x[1:k_- - 1] < x_k = x[k_-:k_+] < x[k_+ + 1:n]$ on output.

Our fairly extensive computational tests with quickselect (we left quicksort for future
work) were quite surprising. First, the inclusion of pointer tests in the key comparison
loops didn't result in significant slowdowns; this is in sharp contrast with traditional
recommendations [Knu98, Ex. 5.2.2-24], [Sed78, p. 848], but agrees with the observation
of [BeM93] that Knuth's MIX cost model needn't be appropriate for modern machines.
Second, the overheads of ternary schemes relative to binary schemes were quite mild.
Third, Lomuto's binary scheme was hopeless when many equal keys occurred, since its
running time may be quadratic in the number of keys equal to the $k$th smallest.

More information on theoretical and practical aspects of quicksort and quickselect can
be found in [BeS97, Grü99, HwT02, KMP97, MaR01, Mus97, Val00] and references therein.

The paper is organized as follows. The four bipartitioning schemes of interest are
described in §2 and their average-case analysis is given in §3. In §4 we present tuned
versions (cf. [MaR01, §7]) for the case where the pivot is selected from a sample of several
elements. Tripartitioning schemes are discussed in §5. Finally, our computational results
are reported in §6.



2 Bipartitioning schemes 

Each invocation of quicksort and quickselect deals with a subarray $x[l:r]$ of the input array
$x[1:n]$; abusing notation, we let $n := r - l + 1$ denote the size of the current subarray. It is
convenient to assume that the pivot $v := x_l$ is placed first (after a possible exchange with
another element). Each binary scheme given below partitions the array into three blocks,
with $x_m \le v$ for $l \le m < a$, $x_m = v$ for $a \le m \le b$, $x_m \ge v$ for $b < m \le r$, $l \le a \le b \le r$.
We suppose that $n > 1$ (otherwise partitioning is trivial: set $a := b := l$).

2.1 Safeguarded binary partition 



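Our first modification of the classic scheme of Sedgewick [Knu98, §5.2.2, Algorithm Q]
proceeds as follows. After comparing the pivot $v := x_l$ to $x_r$ to produce the initial setup

$$\big[\; x = v \;\big|\; x < v \;\big|\; ? \;\big|\; x > v \;\big|\; x = v \;\big], \qquad \text{markers } l,\ p,\ i,\ j,\ q,\ r \tag{2.1}$$

with $i := l$ and $j := r$, we work with the three inner blocks of the array

$$\big[\; x = v \;\big|\; x \le v \;\big|\; ? \;\big|\; x \ge v \;\big|\; x = v \;\big], \qquad \text{markers } l,\ p,\ i,\ j,\ q,\ r \tag{2.2}$$

until the middle part is empty or just contains an element equal to the pivot

$$\big[\; x = v \;\big|\; x \le v \;\big|\; x = v \;\big|\; x \ge v \;\big|\; x = v \;\big], \qquad \text{markers } l,\ p,\ j,\ i,\ q,\ r \tag{2.3}$$

(i.e., $j = i - 1$ or $j = i - 2$), then swap the ends into the middle for the final arrangement

$$\big[\; x \le v \;\big|\; x = v \;\big|\; x \ge v \;\big], \qquad \text{markers } l,\ a,\ b,\ r \tag{2.4}$$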

Scheme A (Safeguarded binary partition).

A1. [Initialize.] Set $i := l$, $p := i + 1$, $j := r$ and $q := j - 1$. If $v > x_j$, exchange $x_i \leftrightarrow x_j$
and set $p := i$; else if $v < x_j$, set $q := j$.

A2. [Increase $i$ until $x_i \ge v$.] Increase $i$ by 1; then if $x_i < v$, repeat this step.

A3. [Decrease $j$ until $x_j \le v$.] Decrease $j$ by 1; then if $x_j > v$, repeat this step.

A4. [Exchange.] (Here $x_j \le v \le x_i$.) If $i < j$, exchange $x_i \leftrightarrow x_j$ and return to A2. If
$i = j$ (so that $x_i = x_j = v$), increase $i$ by 1 and decrease $j$ by 1.

A5. [Cleanup.] Set $a := l + j - p + 1$ and $b := r - q + i - 1$. If $l < p$, exchange $x_l \leftrightarrow x_j$.
If $q < r$, exchange $x_i \leftrightarrow x_r$.
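Steps A1-A5 can be sketched in Python as follows. This is a minimal illustration rather than the paper's implementation: the function name `partition_a`, the 0-based inclusive indices `l` and `r`, and the returned pair `(a, b)` are assumptions made here for the example.

```python
def partition_a(x, l, r):
    """Sketch of Scheme A on x[l..r] (0-based, inclusive), pivot v = x[l], r > l.

    Returns (a, b) such that x[l..a-1] <= v, x[a..b] == v, x[b+1..r] >= v.
    """
    v = x[l]
    i, p, j, q = l, l + 1, r, r - 1            # A1: initialize
    if v > x[j]:
        x[i], x[j] = x[j], x[i]                # pivot moves to the right end
        p = i
    elif v < x[j]:
        q = j
    while True:
        i += 1                                 # A2: no index test needed
        while x[i] < v:
            i += 1
        j -= 1                                 # A3: no index test needed
        while x[j] > v:
            j -= 1
        if i < j:                              # A4: exchange and continue
            x[i], x[j] = x[j], x[i]
            continue
        if i == j:                             # x[i] == v sits in the middle
            i += 1
            j -= 1
        break
    a = l + j - p + 1                          # A5: cleanup
    b = r - q + i - 1
    if l < p:
        x[l], x[j] = x[j], x[l]
    if q < r:
        x[i], x[r] = x[r], x[i]
    return a, b
```

The inner loops rely on $x_l \le v \le x_r$ holding after A1, which is why they carry no bounds checks.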

Step A1 ensures that $x_l \le v \le x_r$, so steps A2 and A3 don't need to test whether $i < j$.
In other words, while searching for a pair of elements to exchange, the previously sorted
data (initially, $x_l \le v \le x_r$) are used to bound the search, and the index values are compared
only when an exchange is to be made. This leads to a small amount of overshoot in the
search: in addition to the necessary $n - 1$ comparisons, scheme A makes two spurious
comparisons or just one (when $i = j + 1$ or $i = j$ at A4, respectively). Step A4 makes at
most $n/2$ index comparisons and at most $n/2 - 1$ swaps (since $j - i$ decreases at least by 2
between swaps); thus A1 and A4 make at most $n/2$ swaps. To avoid vacuous swaps, step
A5 may use the tests $l < \min\{p, j\}$ and $\max\{q, i\} < r$; on the other hand, A5 could make
unconditional swaps without impairing (2.4).

Of course, scheme A could be described in other equivalent ways. For instance, A1 and
A5 can be written in terms of binary variables $i_l := p - l$ and $i_r := r - q$; then A5 may
decrease $j$ by 1 if $i_l = 1$ and increase $i$ by 1 if $i_r = 1$ to have $a = j + 1$, $b = i - 1$ in (2.4).

A more drastic simplification could swap $x_l \leftrightarrow x_r$ if $v > x_r$ at A1, omit the second
instruction of A4, set $a := b := j$ at A5 and swap $x_l \leftrightarrow x_j$ if $x_l = v$, $x_j \leftrightarrow x_r$ otherwise.

2.2 Single-index controlled binary partition 

It is instructive to compare scheme A with a popular scheme of Sedgewick [BeM93, Progs.
3 and 4], based on the arrangements (2.2)-(2.3) with $p := l + 1$, $q := r$.

Scheme B (Single-index controlled binary partition).

B1. [Initialize.] Set $i := l$ and $j := r + 1$.

B2. [Increase $i$ until $x_i \ge v$.] Increase $i$ by 1; then if $i \le r$ and $x_i < v$, repeat this step.

B3. [Decrease $j$ until $x_j \le v$.] Decrease $j$ by 1; then if $x_j > v$, repeat this step.

B4. [Exchange.] (Here $x_j \le v \le x_i$.) If $i \le j$, exchange $x_i \leftrightarrow x_j$ and return to B2.

B5. [Cleanup.] Exchange $x_l \leftrightarrow x_j$.

The test $i \le r$ of step B2 is necessary when $v$ is greater than the remaining elements. If
$i = j$ at B4, a vacuous swap is followed by one or two unnecessary comparisons; hence B4
may be replaced by A4 to achieve the same effect at no extra cost. With this replacement,
scheme B makes $n + 1$ comparisons, or $n$ if $i = j$ or $i = r + 1$ at B4, and at most $(n + 1)/2$
index comparisons and $(n - 1)/2$ swaps at B4. Usually scheme B is used as if $a := b := j$ in
(2.4), but in fact B5 may set $a := j$, $b := i - 1$ (note that the final arrangement of [BeM93,
p. 1252] is wrong when $j = i - 2$). Therefore, from now on, we assume that scheme B
incorporates our suggested modifications of steps B4 and B5.
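A hedged Python sketch of scheme B with both suggested modifications (B4 replaced by A4, and B5 returning $a := j$, $b := i - 1$) is given below. The name `partition_b` and the 0-based inclusive indexing are assumptions of this illustration.

```python
def partition_b(x, l, r):
    """Sketch of modified Scheme B on x[l..r] (0-based, inclusive), pivot v = x[l].

    Returns (a, b) such that x[l..a-1] <= v, x[a..b] == v, x[b+1..r] >= v.
    """
    v = x[l]
    i, j = l, r + 1                            # B1: initialize
    while True:
        i += 1                                 # B2: the index test guards the right end
        while i <= r and x[i] < v:
            i += 1
        j -= 1                                 # B3: the pivot x[l] stops this scan
        while x[j] > v:
            j -= 1
        if i < j:                              # A4 in place of B4
            x[i], x[j] = x[j], x[i]
            continue
        if i == j:                             # x[i] == v stays in the middle
            i += 1
            j -= 1
        break
    x[l], x[j] = x[j], x[l]                    # B5: move the pivot into place
    return j, i - 1
```

When the scans meet on a key equal to the pivot, the returned block `x[a..b]` has two elements, which is the case the usual `a = b = j` convention overlooks.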

2.3 Double-index controlled binary partition 

The following scheme compares both scanning indices $i$ and $j$ in their inner loops.

Scheme C (Double-index controlled binary partition).

C1. [Initialize.] Set $i := l + 1$ and $j := r$.

C2. [Increase $i$ until $x_i \ge v$.] If $i \le j$ and $x_i < v$, increase $i$ by 1 and repeat this step.

C3. [Decrease $j$ until $x_j \le v$.] If $i < j$ and $x_j > v$, decrease $j$ by 1 and repeat this step.
If $i \ge j$, set $j := i - 1$ and go to C5.

C4. [Exchange.] Exchange $x_i \leftrightarrow x_j$, increase $i$ by 1, decrease $j$ by 1 and return to C2.

C5. [Cleanup.] Set $a := b := j$. Exchange $x_l \leftrightarrow x_j$.

Thanks to its tight index control, scheme C makes just $n - 1$ comparisons and at most
$(n - 1)/2$ swaps at C4. Surprisingly, we have not found this scheme in the literature.
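The comparison count of scheme C can be checked with a small instrumented sketch. The function name `partition_c`, the 0-based inclusive indices and the returned comparison counter are assumptions added for illustration only.

```python
def partition_c(x, l, r):
    """Sketch of Scheme C on x[l..r] (0-based, inclusive), pivot v = x[l].

    Returns (a, b, comparisons) with a == b the final pivot position.
    """
    v = x[l]
    comparisons = 0
    i, j = l + 1, r                            # C1: initialize
    while True:
        while i <= j:                          # C2: index test, then key test
            comparisons += 1
            if x[i] < v:
                i += 1
            else:
                break
        while i < j:                           # C3: strict index test avoids re-comparing x[i]
            comparisons += 1
            if x[j] > v:
                j -= 1
            else:
                break
        if i >= j:                             # scans met: x[i..r] >= v
            j = i - 1
            break
        x[i], x[j] = x[j], x[i]                # C4: exchange
        i += 1
        j -= 1
    x[l], x[j] = x[j], x[l]                    # C5: cleanup
    return j, j, comparisons
```

Every key other than the pivot is compared exactly once, so the counter equals `r - l` on return.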






2.4 Lomuto's binary partition 

We now consider Lomuto's partition [BeM93, Prog. 2], based on the arrangements

$$\big[\; v \;\big|\; x < v \;\big|\; x \ge v \;\big|\; ? \;\big] \ \ (\text{markers } l,\ p,\ i,\ r) \quad\longrightarrow\quad \big[\; x < v \;\big|\; v \;\big|\; x \ge v \;\big] \ \ (\text{markers } l,\ p,\ r) \tag{2.5}$$



Scheme D (Lomuto's binary partition).

D1. [Initialize.] Set $i := l + 1$ and $p := l$.

D2. [Check if done.] If $i > r$, go to D4.

D3. [Exchange if necessary.] If $x_i < v$, increase $p$ by 1 and exchange $x_p \leftrightarrow x_i$. Increase $i$
by 1 and return to D2.

D4. [Cleanup.] Set $a := b := p$. Exchange $x_l \leftrightarrow x_p$.

At first sight, scheme D looks good: it makes just the $n - 1$ necessary comparisons.
However, it can make up to $n - 1$ swaps (e.g., vacuous swaps when $v$ is greater than the
remaining elements, or $n - 2$ nonvacuous swaps for $x[l:r] = [n - 1, n, 1, 2, \ldots, n - 2]$).
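The two swap-heavy inputs mentioned above can be reproduced with a short sketch of scheme D. The name `partition_d`, the 0-based indexing and the two returned counters (all swaps at D3, and those that are vacuous) are assumptions introduced for this example.

```python
def partition_d(x, l, r):
    """Sketch of Scheme D (Lomuto) on x[l..r] (0-based, inclusive), pivot v = x[l].

    Returns (a, swaps, vacuous): the pivot position, the number of D3 swaps,
    and how many of those exchanged an element with itself.
    """
    v = x[l]
    p = l                                      # D1: initialize
    swaps = vacuous = 0
    for i in range(l + 1, r + 1):              # D2: loop until i > r
        if x[i] < v:                           # D3: grow the block of low keys
            p += 1
            x[p], x[i] = x[i], x[p]
            swaps += 1
            if p == i:
                vacuous += 1
    x[l], x[p] = x[p], x[l]                    # D4: cleanup
    return p, swaps, vacuous


# n = 6: the input [n-1, n, 1, ..., n-2] forces n - 2 nonvacuous swaps.
y = [5, 6, 1, 2, 3, 4]
print(partition_d(y, 0, 5), y)
```

With a pivot larger than all other keys, e.g. `[6, 1, 2, 3, 4, 5]`, every one of the `n - 1` swaps is vacuous.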



2.5 Comparison of bipartitioning schemes 
2.5.1 Swaps for distinct keys 

When the elements are distinct, we have strict inequalities in (2.2)-(2.5), $j = i - 1$ in (2.3)
and $a = b$ in (2.4). Distinguishing low keys $x_m < v$ and high keys $x_m > v$, let $t$ be the
number of high keys in the input subarray $x[l+1:a]$. Then schemes B and C make the
same sequence of $t$ swaps to produce the arrangement

$$\big[\; v \;\big|\; x < v \;\big|\; x > v \;\big], \qquad \text{markers } l,\ a,\ r \tag{2.6}$$

before the final swap $x_l \leftrightarrow x_a$, and their operation is described by the instruction: until
there are no high keys in $x[l+1:a]$, swap the leftmost high key in $x[l+1:a]$ with the
rightmost low key in $x[a+1:r]$. Thus schemes B and C make just the necessary $t$ swaps.
Scheme A acts in the same way if $x_r > v$ at A1. If $x_r < v$ at A1, let $t_1$ be the number of
low keys in $x[a:r]$; in this low case, after the initial swap $x_l \leftrightarrow x_r$, scheme A makes $t_1 - 1$
swaps, each time exchanging the leftmost high key in $x[l+1:a-1]$ with the rightmost low
key in $x[a:r-1]$, to produce the arrangement

$$\big[\; x < v \;\big|\; x > v \;\big|\; v \;\big], \qquad \text{markers } l,\ a,\ r \tag{2.7}$$

before the final swap $x_a \leftrightarrow x_r$. Since the number of low keys in $x[a+1:r]$ equals $t$, we have
$t_1 = t + 1$ if $x_a < v$, otherwise $t_1 = t$. Thus, relative to schemes B and C, scheme A makes
an extra swap when both $x_a$ and $x_r$ are low. Note that schemes A, B and C never swap
the same key twice while producing the arrangements (2.6)-(2.7). In contrast, scheme D
may swap the same high key many times while producing the arrangement (2.6) (usually
different from that of B and C). In fact scheme D makes exactly $a - l$ swaps; this is
the total number of low keys. Thus the number of extra swaps made by scheme D relative
to B and C, $a - l - t$, equals the number of low keys in the initial $x[l+1:a]$.

2.5.2 Swaps for equal keys 

When equal keys occur, schemes A, B and C perform similarly to Sedgewick's scheme of
[Sed77, Prog. 1]; in particular, thanks to stopping the scanning pointers on keys equal to
the pivot, they tend to produce balanced partitions. For instance, when all the keys are
equal, we get the following partitions: for scheme A, $a = \lfloor (l+r-1)/2 \rfloor$, $b = a + 1 + (n \bmod 2)$
after $\lceil (n+1)/2 \rceil$ swaps; for scheme B, $a = \lfloor (l+r)/2 \rfloor$, $b = a + 1 - (n \bmod 2)$ after $\lceil n/2 \rceil$
swaps; for scheme C, $a = b = \lfloor (l+r)/2 \rfloor$ after $\lceil n/2 \rceil$ swaps. In contrast, scheme D makes
no swaps, but yields $a = b = l$, the worst possible partition.

3 Average-case analysis of bipartitioning schemes 

In this section we assume that the keys to be partitioned are distinct and in random order;
since the schemes depend only on the relative order of the keys, we may as well assume that
they are the first $n$ positive integers in random order. For simpler notation, we suppose
that $l = 1$ and $r = n$. It is easy to see that when the keys in $x[l+1:r]$ are in random
order, each scheme of §2 preserves randomness in the sense of producing $x[l:a-1]$ and
$x[a+1:r]$ in which the low and high keys are in random order (since the relative orders of
the low keys and the high keys on input have no effect on the scheme).

3.1 Expected numbers of swaps for fixed pivot ranks 

For a given pivot $v := x_1$, let $j_v$ denote the number of low keys in the array $x[2:n]$; then
$a = j_v + 1$ is the rank of $v$. Once $j_v$ is fixed at $j$ (say), to compute the average number of
swaps made by each scheme, it's enough to assume that the keys in $x[2:n]$ are in random
order; thus averages are taken over the $(n - 1)!$ distinct inputs. Our analysis hinges on the
following well-known fact (cf. [Chv02]).

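Fact 3.1. Suppose an array $x[\tilde l:\tilde r]$ contains $\tilde n := \tilde r - \tilde l + 1 > 0$ distinct keys, of which $\tilde\jmath$
are low and $\tilde n - \tilde\jmath$ are high. If all the $\tilde n!$ permutations of the keys are equiprobable, then
$\tilde\jmath(\tilde n - \tilde\jmath)/\tilde n$ is the average number of high keys in the first $\tilde\jmath$ positions.

Proof. List all the $\tilde n!$ key permutations as rows of an $\tilde n! \times \tilde n$ matrix. In each column,
each key appears $(\tilde n - 1)!$ times, so the number of high keys in the first $\tilde\jmath$ columns is
$\tilde\jmath(\tilde n - \tilde\jmath)(\tilde n - 1)!$; dividing by $\tilde n!$ gives the average number $\tilde\jmath(\tilde n - \tilde\jmath)/\tilde n$. □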
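Fact 3.1 is easy to confirm by exhaustive enumeration for small sizes. The sketch below is only a sanity check; the helper name `avg_high_in_first` and the encoding of low keys as `0..j-1` and high keys as `j..n-1` are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import permutations


def avg_high_in_first(n, j):
    """Average number of high keys among the first j positions.

    Keys 0..j-1 play the role of low keys and keys j..n-1 of high keys;
    the average is taken over all n! permutations.
    """
    total = count = 0
    for perm in permutations(range(n)):
        total += sum(1 for key in perm[:j] if key >= j)
        count += 1
    return Fraction(total, count)


# Compare with the closed form j (n - j) / n for one small case.
print(avg_high_in_first(5, 2), Fraction(2 * (5 - 2), 5))
```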

Lemma 3.2. Suppose the number of low keys equals $j$. Let $T_A^j$, $T_B^j$, $T_C^j$, $T_D^j$ denote the
average numbers of swaps made by schemes A, B, C and D, excluding the final swaps.
Then

$$T_A^j = \frac{j(n-1-j)}{n-1}\,\frac{n-3}{n-2} + \frac{j}{n-1} \quad \text{if } n \ge 3, \tag{3.1a}$$

$$T_A^j = j \quad \text{if } n = 2, \tag{3.1b}$$

$$T_B^j = T_C^j = \frac{j(n-1-j)}{n-1}, \tag{3.2}$$

$$T_D^j = j. \tag{3.3}$$

Proof. By assumption, the arrangements (2.6)-(2.7) involve $l = 1$, $a = j + 1$, $r = n$. The
results follow from suitable choices of $\tilde l$, $\tilde r$, $\tilde\jmath$ in Fact 3.1.

For scheme A, assuming $n \ge 3$, let $\tilde l = 2$, $\tilde r = n - 1$. Depending on whether $x_n > v$ or
$x_n < v$, scheme A produces either (2.6) or (2.7) from the initial configurations

$$\big[\; v \;\big|\; ? \;\big|\; x > v \;\big] \quad\text{or}\quad \big[\; x < v \;\big|\; ? \;\big|\; v \;\big], \qquad \text{markers } 1,\ a,\ n \tag{3.4}$$

For $x_n > v$, take $\tilde\jmath = j = a - 1$; then the average number of high keys in $x[2:a]$ (i.e., of
swaps) equals $j(n - 2 - j)/(n - 2)$. For $x_n < v$, take $\tilde\jmath = j - 1$; in this case $t_1 - 1$, the
number of low keys in $x[a:n-1]$, equals the number of high keys in $x[2:j]$, so the average
value of $t_1$ equals $(j - 1)(n - 1 - j)/(n - 2) + 1$. Since there are $j$ low keys and $n - 1 - j$ high
keys which appear in random order, we have $x_n > v$ with probability $(n - 1 - j)/(n - 1)$
and $x_n < v$ with probability $j/(n - 1)$. Adding the contributions of these cases multiplied
by their probabilities yields (3.1a). For $n = 2$, A1 makes 1 swap if $j = 1$, 0 otherwise, so
(3.1b) holds.

For schemes B and C, take $\tilde l = 2$, $\tilde r = n$, $\tilde\jmath = j$ to get (3.2) in a similar way.
Since scheme D makes $a - l = j$ swaps, (3.3) follows. □
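The averages (3.1)-(3.2) can be cross-checked by brute force over all orders of the non-pivot keys. The sketch below assumes distinct keys $1,\ldots,n$ with the pivot in front; the names `swaps_a`, `swaps_c`, `average_swaps`, `t_a` and `t_b` are hypothetical helpers introduced here, and the swap counters exclude the final cleanup swaps as in the lemma.

```python
from fractions import Fraction
from itertools import permutations


def swaps_a(x):
    """Swaps made by steps A1 and A4 of Scheme A on the whole list (distinct keys)."""
    r = len(x) - 1
    v = x[0]
    i, j, swaps = 0, r, 0
    if v > x[r]:                               # A1: initial exchange
        x[0], x[r] = x[r], x[0]
        swaps += 1
    while True:
        i += 1
        while x[i] < v:                        # A2
            i += 1
        j -= 1
        while x[j] > v:                        # A3
            j -= 1
        if i >= j:                             # i == j cannot occur for distinct keys
            return swaps
        x[i], x[j] = x[j], x[i]                # A4
        swaps += 1


def swaps_c(x):
    """Swaps made by step C4 of Scheme C on the whole list."""
    v = x[0]
    i, j, swaps = 1, len(x) - 1, 0
    while True:
        while i <= j and x[i] < v:             # C2
            i += 1
        while i < j and x[j] > v:              # C3
            j -= 1
        if i >= j:
            return swaps
        x[i], x[j] = x[j], x[i]                # C4
        swaps += 1
        i += 1
        j -= 1


def average_swaps(count_swaps, n, j):
    """Average swap count over all (n-1)! inputs whose pivot has j low keys."""
    pivot = j + 1
    others = [k for k in range(1, n + 1) if k != pivot]
    results = [count_swaps([pivot, *perm]) for perm in permutations(others)]
    return Fraction(sum(results), len(results))


def t_a(n, j):                                 # (3.1a)-(3.1b)
    if n == 2:
        return Fraction(j)
    return Fraction(j * (n - 1 - j), n - 1) * Fraction(n - 3, n - 2) + Fraction(j, n - 1)


def t_b(n, j):                                 # (3.2)
    return Fraction(j * (n - 1 - j), n - 1)
```

For example, `average_swaps(swaps_a, 4, 2)` and `t_a(4, 2)` can be compared directly, and likewise `average_swaps(swaps_c, 4, 2)` with `t_b(4, 2)`.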



To compare the average values (3.1)-(3.3), note that we have $0 \le j < n$,

$$T_A^j = T_B^j + \frac{j(j-1)}{(n-1)(n-2)} \quad\text{and}\quad T_D^j = T_A^j + \frac{j\,[\,j(n-3)+1\,]}{(n-1)(n-2)} \quad \text{if } n \ge 3, \tag{3.5}$$

$T_B^j = 0$ and $T_A^j = T_D^j = j$ if $n = 2$. Thus $T_A^j \le T_B^j + 1$ (with equality iff there are
no high keys), whereas $T_D^j$ is much greater than $T_A^j$ when there are relatively many low
keys.



3.2 Bounding expected numbers of swaps for arbitrary pivots 

From now on we assume that the pivot is selected by an arbitrary rule for which (once the
pivot is swapped into $x_l$ if necessary) each permutation of the remaining keys is equiprobable.
Let $T_A$, $T_B$, $T_C$, $T_D$ denote the average numbers of swaps made by schemes A, B,
C and D, excluding the final swaps. Of course, these numbers depend on details of pivot
selection, but they can be bounded independently of such details. To this end we compute
the maxima of the average values (3.1)-(3.3).
Lemma 3.3. Let $T_A^{\max}$, $T_B^{\max}$, $T_C^{\max}$, $T_D^{\max}$ denote the maxima of $T_A^j$, $T_B^j$, $T_C^j$, $T_D^j$ over
$0 \le j < n$. Then

$$T_A^{\max} = \begin{cases} \dfrac{n}{4} - \dfrac{(n-5)(n \bmod 2)}{4(n-1)(n-2)} & \text{if } n \ge 5, \\[1ex] 1 & \text{if } n \le 4, \end{cases} \tag{3.6}$$

$$T_B^{\max} = T_C^{\max} = \frac{n-1}{4} - \frac{(n+1) \bmod 2}{4(n-1)}, \tag{3.7}$$

$$T_D^{\max} = n - 1. \tag{3.8}$$

Proof. The maximum of (3.1) is attained at $j = \lceil n/2 \rceil$ if $n > 4$, $j = n - 1$ otherwise. The
maximum of (3.2) is attained at $j = \lfloor n/2 \rfloor$. The rest follows by simple computations. □
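The closed forms (3.6)-(3.7) can be compared numerically against direct maximization of (3.1)-(3.2) over $j$. In this sketch the function names `t_a`, `t_b`, `t_a_max` and `t_b_max` are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def t_a(n, j):                                 # (3.1a)-(3.1b)
    if n == 2:
        return Fraction(j)
    return Fraction(j * (n - 1 - j), n - 1) * Fraction(n - 3, n - 2) + Fraction(j, n - 1)


def t_b(n, j):                                 # (3.2)
    return Fraction(j * (n - 1 - j), n - 1)


def t_a_max(n):                                # closed form (3.6)
    if n <= 4:
        return Fraction(1)
    return Fraction(n, 4) - Fraction((n - 5) * (n % 2), 4 * (n - 1) * (n - 2))


def t_b_max(n):                                # closed form (3.7)
    return Fraction(n - 1, 4) - Fraction((n + 1) % 2, 4 * (n - 1))


# Direct maximization over 0 <= j < n versus the closed forms, for n = 7.
print(max(t_a(7, j) for j in range(7)), t_a_max(7))
print(max(t_b(7, j) for j in range(7)), t_b_max(7))
```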

Corollary 3.4. The average numbers of swaps $T_A$, $T_B$, $T_C$, $T_D$ made by schemes A, B,
C, D are at most $T_A^{\max}$, $T_B^{\max}$, $T_C^{\max}$, $T_D^{\max}$ for the values given in (3.6)-(3.8). In particular,
$T_A$, $T_B$ and $T_C$ are at most $n/4$ for $n > 3$.

3.3 The case where pivots are chosen via sampling 
3.3.1 Pivots with fixed sample ranks 

We assume that the pivot $v$ is selected as the $(p + 1)$th element in a sample of size $s$,
$0 \le p < s \le n$. Thus $p$ and $q := s - 1 - p$ are the numbers of low and high keys in the
sample, respectively. Recall that $v$ has rank $j_v + 1$, where $j_v$ is the total number of low
keys. We shall need the following two expected values for this selection:

$$\mathrm{E}\,j_v = E(n,s,p) := \frac{(p+1)(n+1)}{s+1} - 1, \tag{3.9}$$

$$\mathrm{E}\!\left[\frac{j_v(n-1-j_v)}{n-1}\right] = T(n,s,p) := \frac{(p+1)(q+1)}{(s+1)(s+2)}\,\frac{(n+1)(n+2)}{n-1} - \frac{n}{n-1}. \tag{3.10}$$

Here (3.9) follows from [FlR75, Eq. (10)] and (3.10) from the proof of [MaR01, Lem. 1].

Theorem 3.5. For $E(n,s,p)$ and $T(n,s,p)$ given by (3.9)-(3.10), the average numbers
of swaps $T_A$, $T_B$, $T_C$, $T_D$ made by schemes A, B, C, D are equal to, respectively,

$$T_A(n,s,p) = \frac{\max\{n-3,0\}}{\max\{n-2,1\}}\,T(n,s,p) + \frac{1}{n-1}\,E(n,s,p), \tag{3.11}$$

$$T_B(n,s,p) = T_C(n,s,p) = T(n,s,p), \tag{3.12}$$

$$T_D(n,s,p) = E(n,s,p). \tag{3.13}$$

Proof. Take expectations of the averages (3.1)-(3.3) conditioned on $j_v = j$, and use
(3.9)-(3.10); the two "max" operations in (3.11) combine the cases of $n = 2$ and $n \ge 3$. □
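The expected values (3.9)-(3.10) can be checked for small parameters by enumerating all samples. The sketch assumes that a random sample corresponds to a uniformly chosen $s$-subset of the ranks $1,\ldots,n$; the function names `E`, `T`, `T_A` and `brute_force` are labels chosen here for illustration.

```python
from fractions import Fraction
from itertools import combinations


def E(n, s, p):                                # (3.9)
    return Fraction((p + 1) * (n + 1), s + 1) - 1


def T(n, s, p):                                # (3.10), with q = s - 1 - p
    q = s - 1 - p
    return (Fraction((p + 1) * (q + 1), (s + 1) * (s + 2))
            * Fraction((n + 1) * (n + 2), n - 1) - Fraction(n, n - 1))


def T_A(n, s, p):                              # (3.11)
    return Fraction(max(n - 3, 0), max(n - 2, 1)) * T(n, s, p) + E(n, s, p) / (n - 1)


def brute_force(n, s, p):
    """Average of j_v and of j_v (n-1-j_v)/(n-1) over all s-subsets of ranks 1..n."""
    sum_j = sum_t = Fraction(0)
    count = 0
    for sample in combinations(range(1, n + 1), s):
        j = sample[p] - 1                      # pivot is the (p+1)th smallest sample key
        sum_j += j
        sum_t += Fraction(j * (n - 1 - j), n - 1)
        count += 1
    return sum_j / count, sum_t / count


# Median-of-3 in an array of 8 keys: enumeration versus the closed forms.
print(brute_force(8, 3, 1), (E(8, 3, 1), T(8, 3, 1)))
```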



The average values (3.11)-(3.13) may be compared as follows. First, in the classic case
of $s = 1$ ($p = q = 0$), we have $T_A = n/6$ if $n \ge 3$ (else $T_A = 1/2$), $T_B = T_C = (n - 2)/6$,
$T_D = (n - 1)/2$; thus scheme D makes about three times as many swaps as A, B and C.

Second, for nontrivial samples ($s > 1$) one may ask which choices of $p$ are "good" or
"bad" with respect to swaps. For schemes B and C, the worst case occurs if $p$ is chosen to
maximize (3.10) (where $q + 1 = s - p$); we obtain that for all $0 \le p < s$,

$$T(n,s,p) \le \kappa(s)\,\frac{(n+1)(n+2)}{n-1} - \frac{n}{n-1} \le \frac{n-1}{4} \quad\text{with}\quad \kappa(s) := \frac{s+1}{4(s+2)}, \tag{3.14}$$

where the first inequality holds as equality only for the median-of-$s$ choice of $p = (s - 1)/2$,
and the second one iff $s = n$. Since $E(n,s,p)/(n-1) \le 1$, (3.14) yields $T_A \le (n + 3)/4$, but we
already know that $T_A \le n/4$ (Cor. 3.4). For any median-of-$s$ choice with a fixed $s$, $T_A$, $T_B$
and $T_C$ are asymptotically $\kappa(s)n$, whereas $E(n,s,p) = (n - 1)/2$; thus scheme D makes
about $1/(2\kappa(s)) > 2$ times as many swaps as A, B and C (with $\kappa(3) = 1/5$, $\kappa(5) = 3/14$,
$\kappa(7) = 2/9$, $\kappa(9) = 5/22$). On the other hand, for the extreme choices of $p = 0$ or $p = s - 1$
which minimize (3.10) (then $v$ is the smallest or largest key in the sample), $T_A$, $T_B$ and $T_C$
are asymptotically $ns/((s + 1)(s + 2))$, whereas $T_D$ is asymptotically $n/(s + 1)$ for $p = 0$
and $ns/(s + 1)$ for $p = s - 1$. Thus scheme D can't improve upon A and B even for the
choice of $p = 0$ which minimizes (3.9).
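The constants $\kappa(s)$ and the chain of inequalities (3.14) can be tabulated with a few lines of exact arithmetic. The helper names `kappa` and `T` below are assumptions of this sketch.

```python
from fractions import Fraction


def kappa(s):                                  # kappa(s) from (3.14)
    return Fraction(s + 1, 4 * (s + 2))


def T(n, s, p):                                # (3.10), with q = s - 1 - p
    q = s - 1 - p
    return (Fraction((p + 1) * (q + 1), (s + 1) * (s + 2))
            * Fraction((n + 1) * (n + 2), n - 1) - Fraction(n, n - 1))


# kappa(s) and the approximate swap ratio 1 / (2 kappa(s)) of scheme D versus A, B, C.
for s in (3, 5, 7, 9):
    print(s, kappa(s), float(1 / (2 * kappa(s))))
```

The printed ratios decrease toward 2 as $s$ grows, since $\kappa(s)$ increases toward $1/4$.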

3.3.2 Pivots with random sample ranks 

Following the general framework of [CHT02, §1], suppose the pivot $v$ is selected by taking
a random sample of $s$ elements, and choosing the $(p + 1)$th element in this sample with
probability $\pi_p$, $0 \le p < s$, $\sum_{p=0}^{s-1} \pi_p = 1$. In other words, for $p_v$ denoting the number
of low keys in the sample, we have $\Pr[p_v = p] = \pi_p$. Hence, by viewing (3.9)-(3.13) as
expectations conditioned on the event $p_v = p$, we may take total averages to get

$$\mathrm{E}\,j_v = \mathrm{E}[E(n,s,p_v)] = E(n,s) := \frac{(\mathrm{E}\,p_v + 1)(n+1)}{s+1} - 1, \tag{3.15}$$

$$\mathrm{E}\!\left[\frac{j_v(n-1-j_v)}{n-1}\right] = \mathrm{E}[T(n,s,p_v)] = T(n,s) := \sum_{0 \le p < s} \pi_p\,T(n,s,p), \tag{3.16}$$

and the following extension of Theorem 3.5.

Theorem 3.6. For $E(n,s)$ and $T(n,s)$ given by (3.15)-(3.16), the average numbers of
swaps $T_A$, $T_B$, $T_C$, $T_D$ made by schemes A, B, C, D are equal to, respectively,

$$T_A(n,s) = \frac{\max\{n-3,0\}}{\max\{n-2,1\}}\,T(n,s) + \frac{1}{n-1}\,E(n,s), \tag{3.17}$$

$$T_B(n,s) = T_C(n,s) = T(n,s), \tag{3.18}$$

$$T_D(n,s) = E(n,s). \tag{3.19}$$
Note that in (3.15)-(3.16), we have $\mathrm{E}\,p_v = \sum_{0 \le p < s} \pi_p\,p \le s - 1$ and

$$T(n,s) = \tilde\kappa(s)\,\frac{(n+1)(n+2)}{n-1} - \frac{n}{n-1} \quad\text{with}\quad \tilde\kappa(s) := \sum_{0 \le p < s} \pi_p\,\frac{(p+1)(s-p)}{(s+1)(s+2)}, \tag{3.20}$$

where $\tilde\kappa(s) \le \kappa(s)$ (cf. (3.14)), and $\tilde\kappa(s) = \kappa(s)$ iff $\pi_p = 1$ for $p = (s - 1)/2$. Thus again
$T_A$, $T_B$ and $T_C$ are asymptotically $\tilde\kappa(s)n$, whereas $T_D$ can be much larger.

As an important example, we consider Tukey's ninther, the median of three elements
each of which is the median of three elements [BeM93]. Then $s = 9$ and $\pi_p = 0$ except for
$\pi_3 = \pi_5 = 3/14$, $\pi_4 = 4/7$ [CHT02, Dur03], so $E(n, 9) = (n - 1)/2$ and $\tilde\kappa(9) = 86/385 \approx
0.223$. Thus, when the ninther replaces the median-of-3, $T_A$, $T_B$ and $T_C$ increase by about 12
percent, getting closer to $n/4$, whereas $T_D$ stays at $(n - 1)/2$.
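The ninther's rank distribution and the resulting constant in (3.20) can be recomputed by enumerating how nine distinct ranks split into three groups of three. The function name `ninther_rank_distribution` and the 0-based ranks are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations


def ninther_rank_distribution():
    """Return pi[p] = Pr[the ninther has exactly p smaller keys in the sample of 9]."""
    counts = [0] * 9
    total = 0
    for g1 in combinations(range(9), 3):
        rest = [k for k in range(9) if k not in g1]
        for g2 in combinations(rest, 3):
            g3 = [k for k in rest if k not in g2]
            # Each group is sorted, so its median is the middle element.
            medians = sorted([g1[1], g2[1], g3[1]])
            counts[medians[1]] += 1
            total += 1
    return [Fraction(c, total) for c in counts]


pi = ninther_rank_distribution()
s = 9
kappa_tilde = sum(pi[p] * Fraction((p + 1) * (s - p), (s + 1) * (s + 2)) for p in range(s))
print(pi[3], pi[4], pi[5], kappa_tilde)
```

Dividing `kappa_tilde` by $\kappa(3) = 1/5$ gives the relative increase in swaps over the median-of-3.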



4 Using sample elements as sentinels 

The schemes of §2 can be tuned [MaR01, §7.2] when the pivot $v$ is selected as the $(p + 1)$th
element in a sample of size $s$, assuming $0 \le p < s \le n$ and $q := s - 1 - p > 0$.

First, suppose the $p$ sample keys $\le v$ are placed first, followed by $v$, and the remaining
$q$ sample keys $\ge v$ are placed at the end of the array $x[l:r]$. Then, for $\bar l := l + p$ and
$\bar r := r - q$, we only need to partition the array $x[\bar l:\bar r]$ of size $\bar n := n - s + 1$. The schemes
of §2 are modified as follows.

In step A1 of scheme A, set $i := \bar l$ and $j := \bar r + 1$; in step A5 set $a := j$, $b := i - 1$ and
exchange $x_{\bar l} \leftrightarrow x_j$. This scheme makes $\bar n + 1$ comparisons, or just $\bar n$ if $i = j$ at A4. The
same scheme results from scheme B by replacing $l$, $r$ with $\bar l$, $\bar r$, B4 with A4, and omitting
the test "$i \le r$" in B2. Similarly, $\bar l$, $\bar r$ replace $l$ and $r$ in schemes C and D, which make
$\bar n - 1$ comparisons.

To extend the results of §3 to these modifications, note that for $\bar n = 1$ these schemes
make no swaps except for the final ones. For $\bar n > 1$, schemes A, B and C swap the same
keys, if any. Therefore, under the sole assumption that the keys in $x[\bar l + 1:\bar r]$ are distinct
and in random order, Lemma 3.2 holds with (3.1)-(3.3) replaced by

$$T_A^j = T_B^j = T_C^j = \frac{(j-p)(n-1-q-j)}{n-s}, \qquad T_D^j = j - p, \tag{4.1}$$

using $\tilde l = \bar l + 1$, $\tilde r = \bar r$, $\tilde\jmath = j - p$ in Fact 3.1; further, Lemma 3.3 and Corollary 3.4 hold with
$n$ replaced by $\bar n$, (3.6) omitted and $T_A^{\max} = T_B^{\max}$ in (3.7). Next, (3.9)-(3.10) are replaced
by

$$\mathrm{E}\,j_v - p = \bar E(n,s,p) := \frac{(p+1)(n-s)}{s+1}, \tag{4.2}$$

$$\mathrm{E}\!\left[\frac{(j_v-p)(n-1-q-j_v)}{n-s}\right] = \bar T(n,s,p) := \frac{(p+1)(q+1)}{(s+1)(s+2)}\,(n-s-1), \quad s < n, \tag{4.3}$$

where (4.3) is obtained similarly to (3.10) [MaR01, §7.2]. In view of (4.1)-(4.3), Theorem
3.5 holds with $E(n,s,p)$, $T(n,s,p)$ replaced by $\bar E(n,s,p)$, $\bar T(n,s,p)$, (3.11) omitted and
$T_A(n,s,p) = T_B(n,s,p)$ in (3.12). Finally, (3.14) is replaced by

$$\bar T(n,s,p) \le \kappa(s)(n-s-1) \le \frac{n-s-1}{4} \quad\text{with}\quad \kappa(s) := \frac{s+1}{4(s+2)}, \tag{4.4}$$

where the equality holds iff $p = (s - 1)/2$, in which case $\bar E(n,s,p) = (n - s)/2$.

Figure 4.1: Decision tree for median of three

Randomness may be lost when the sample keys are rearranged by pivot selection, but
it is preserved for the median-of-3 selection with $p = q = 1$. Then the sample keys usually
are $x_l$, $x_{l+1}$, $x_r$ (after exchanging $x_{l+1}$ with the middle key $x_{\lfloor (l+r)/2 \rfloor}$). Arranging the sample
according to Figure 4.1 takes 8/3 comparisons and 7/6 swaps on average for distinct keys.
(These counts hold if, for simpler coding, only the left subtree is used after exchanging
$a \leftrightarrow c$ when $a > c$; other trees [BeM93, Prog. 5] take 3/2 swaps for such simplifications.)

Even if pivot selection doesn't rearrange the array (except for placing the pivot in $x_l$),
scheme A may be simplified: in step A1, set $i := l$ and $j := r + 1$; in step A5 set $a := j$,
$b := i - 1$ and exchange $x_l \leftrightarrow x_j$. The same scheme results from scheme B by replacing
B4 with A4, and omitting the test "$i \le r$" in B2. This simplification is justified by the
presence of at least one key $\ge v$ in $x[l+1:r]$, which stops the scanning index $i$. Hence
the results of §3 remain valid (with (3.1), (3.5), (3.6), (3.11), (3.17) omitted, $T_A^j = T_B^j$
in (3.2), $T_A^{\max} = T_B^{\max}$ in (3.7), $T_A(n,s,p) = T_B(n,s,p)$ in (3.12), $T_A(n,s) = T_B(n,s)$ in
(3.18)).



5 Tripartitioning schemes 

While bipartitioning schemes divide the input keys into $\le v$ and $\ge v$, tripartitioning
schemes divide the keys into $< v$, $= v$ and $> v$. We now give ternary versions of the
schemes of §2, using the following notation for vector swaps (cf. [BeM93]).

A vector swap denoted by $x[a:b] \leftrightarrow x[b+1:c]$ means that the first $d := \min(b + 1 - a, c - b)$
elements of array $x[a:c]$ are exchanged with its last $d$ elements in arbitrary order if $d > 0$;
e.g., we may exchange $x_{a+i} \leftrightarrow x_{c-i}$ for $0 \le i < d$, or $x_{a+i} \leftrightarrow x_{c-d+1+i}$ for $0 \le i < d$.
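A vector swap using the first of the two exchange orders mentioned above might be sketched as follows; the name `vecswap` and the 0-based inclusive indices are assumptions of this illustration.

```python
def vecswap(x, a, b, c):
    """Vector swap x[a..b] <-> x[b+1..c] (0-based, inclusive bounds).

    Exchanges the first d elements of x[a..c] with its last d elements,
    where d = min(b + 1 - a, c - b); nothing happens when d <= 0.
    """
    d = min(b + 1 - a, c - b)
    for t in range(d):
        x[a + t], x[c - t] = x[c - t], x[a + t]


y = [7, 7, 1, 2, 3]          # two "equal" keys followed by three others
vecswap(y, 0, 1, 4)
print(y)                     # the two 7s now occupy the last two positions
```

Only $d$ exchanges are made, so the shorter of the two blocks determines the cost.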

5.1 Safeguarded ternary partition 

Our ternary version of scheme A employs the following "strict" analogs of (2.2)-(2.4):

$$\big[\; x = v \;\big|\; x < v \;\big|\; ? \;\big|\; x > v \;\big|\; x = v \;\big], \qquad \text{markers } l,\ p,\ i,\ j,\ q,\ r \tag{5.1}$$

$$\big[\; x = v \;\big|\; x < v \;\big|\; x > v \;\big|\; x = v \;\big], \qquad \text{markers } l,\ p,\ j,\ i,\ q,\ r \tag{5.2}$$

$$\big[\; x < v \;\big|\; x = v \;\big|\; x > v \;\big], \qquad \text{markers } l,\ a,\ b,\ r \tag{5.3}$$



Scheme E (Safeguarded ternary partition).

E1. [Initialize.] Set $i := l$, $p := i + 1$, $j := r$ and $q := j - 1$. If $v > x_j$, exchange $x_i \leftrightarrow x_j$
and set $p := i$; else if $v < x_j$, set $q := j$.

E2. [Increase $i$ until $x_i \ge v$.] Increase $i$ by 1; then if $x_i < v$, repeat this step.

E3. [Decrease $j$ until $x_j \le v$.] Decrease $j$ by 1; then if $x_j > v$, repeat this step.

E4. [Exchange.] (Here $x_j \le v \le x_i$.) If $i < j$, exchange $x_i \leftrightarrow x_j$; then if $x_i = v$, exchange
$x_i \leftrightarrow x_p$ and increase $p$ by 1; if $x_j = v$, exchange $x_j \leftrightarrow x_q$ and decrease $q$ by 1; return
to E2. If $i = j$ (so that $x_i = x_j = v$), increase $i$ by 1 and decrease $j$ by 1.

E5. [Cleanup.] Set $a := l + j - p + 1$ and $b := r - q + i - 1$. Exchange $x[l:p-1] \leftrightarrow x[p:j]$
and $x[i:q] \leftrightarrow x[q+1:r]$.

Similarly to scheme A, scheme E makes $n$ or $n + 1$ key comparisons, and at most $n/2$
index comparisons at E4. Let $n_<$, $n_=$, $n_>$ denote the numbers of low, equal and high keys
(here $j - p + 1$, $b - a + 1$, $q - i + 1$). Step E4 makes at most $n/2 - 1$ "usual" swaps
$x_i \leftrightarrow x_j$, and $n_= - 1$ or $n_= - 2$ "equal" swaps when $x_i = v$ or $x_j = v$. Step E5 makes
$\min\{p - l, n_<\} + \min\{r - q, n_>\}$ swaps; in particular, at most $\min\{n_=, n_< + n_>\}$ swaps.
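Steps E1-E5 can be sketched in Python as below. The names `partition_e` and `vecswap`, and the 0-based inclusive indexing, are assumptions of this illustration; the vector swaps use one of the exchange orders permitted by the definition in §5.

```python
def vecswap(x, a, b, c):
    """Vector swap x[a..b] <-> x[b+1..c] (inclusive bounds)."""
    d = min(b + 1 - a, c - b)
    for t in range(d):
        x[a + t], x[c - t] = x[c - t], x[a + t]


def partition_e(x, l, r):
    """Sketch of Scheme E on x[l..r] (0-based, inclusive), pivot v = x[l], r > l.

    Returns (a, b) such that x[l..a-1] < v, x[a..b] == v, x[b+1..r] > v.
    """
    v = x[l]
    i, p, j, q = l, l + 1, r, r - 1            # E1: initialize
    if v > x[j]:
        x[i], x[j] = x[j], x[i]
        p = i
    elif v < x[j]:
        q = j
    while True:
        i += 1                                 # E2
        while x[i] < v:
            i += 1
        j -= 1                                 # E3
        while x[j] > v:
            j -= 1
        if i < j:                              # E4
            x[i], x[j] = x[j], x[i]
            if x[i] == v:                      # park an equal key at the left end
                x[i], x[p] = x[p], x[i]
                p += 1
            if x[j] == v:                      # park an equal key at the right end
                x[j], x[q] = x[q], x[j]
                q -= 1
            continue
        if i == j:
            i += 1
            j -= 1
        break
    a = l + j - p + 1                          # E5: cleanup
    b = r - q + i - 1
    vecswap(x, l, p - 1, j)
    vecswap(x, i, q, r)
    return a, b
```

For the input `[0, 1, 0]` with pivot 0, this sketch returns `(0, 1)` and leaves `[0, 0, 1]`, a true ternary partition.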

5.2 Single-index controlled ternary partition 

Our ternary version of scheme B also employs the arrangements (5.1)-(5.2).

Scheme F (Single-index controlled ternary partition).

F1. [Initialize.] Set $i := l$, $p := i + 1$, $j := r + 1$ and $q := j - 1$.

F2. [Increase $i$ until $x_i \ge v$.] Increase $i$ by 1; then if $i \le r$ and $x_i < v$, repeat this step.

F3. [Decrease $j$ until $x_j \le v$.] Decrease $j$ by 1; then if $x_j > v$, repeat this step.

F4. [Exchange.] (Here $x_j \le v \le x_i$.) If $i < j$, exchange $x_i \leftrightarrow x_j$; then if $x_i = v$, exchange
$x_i \leftrightarrow x_p$ and increase $p$ by 1; if $x_j = v$, exchange $x_j \leftrightarrow x_q$ and decrease $q$ by 1; return
to F2. If $i = j$ (so that $x_i = x_j = v$), increase $i$ by 1 and decrease $j$ by 1.

F5. [Cleanup.] Set $a := l + j - p + 1$ and $b := r - q + i - 1$. Exchange $x[l:p-1] \leftrightarrow x[p:j]$
and $x[i:q] \leftrightarrow x[q+1:r]$.

The comparison and swap counts of scheme F are similar to those of scheme E; in
particular, step F5 makes $\min\{p - l, n_<\} + \min\{r - q, n_>\}$ swaps, where $p - l + r - q = n_=$
or $n_= - 1$. In contrast, a similar scheme of Sedgewick [Sed98, Chap. 7, quicksort] swaps all
the $n_=$ equal keys in its last step. More importantly, Sedgewick's scheme needn't produce
true ternary partitions (e.g., for $x = [0, 1, 0]$ and $v = 0$, it doesn't change the array).
5.3 Double-index controlled ternary partition

We now present our modification of the ternary scheme of [BeM93], described also in
[BeS97, Prog. 1] and [Knu98, Ex. 5.2.2-41]. It employs the loop invariant (5.1), and the
cross-over arrangement (5.2) with $j = i - 1$ for the swaps leading to the partition (5.3).

Scheme G (Double-index controlled ternary partition).

G1. [Initialize.] Set $i := p := l + 1$ and $j := q := r$.

G2. [Increase $i$ until $x_i > v$.] If $i \le j$ and $x_i < v$, increase $i$ by 1 and repeat this step. If
$i \le j$ and $x_i = v$, exchange $x_p \leftrightarrow x_i$, increase $p$ and $i$ by 1, and repeat this step.

G3. [Decrease $j$ until $x_j < v$.] If $i < j$ and $x_j > v$, decrease $j$ by 1 and repeat this step.
If $i < j$ and $x_j = v$, exchange $x_j \leftrightarrow x_q$, decrease $j$ and $q$ by 1, and repeat this step.
If $i \ge j$, set $j := i - 1$ and go to G5.

G4. [Exchange.] Exchange $x_i \leftrightarrow x_j$, increase $i$ by 1, decrease $j$ by 1, and return to G2.

G5. [Cleanup.] Set $a := l + i - p$ and $b := r - q + j$. Swap $x[l:p-1] \leftrightarrow x[p:j]$ and
$x[i:q] \leftrightarrow x[q+1:r]$.

Steps G2 and G3 make $n_= - 1$ swaps, step G4 at most $\min\{n_<, n_>\} \le (n - 1)/2$ swaps,
and step G5 takes $\min\{p - l, n_<\} + \min\{r - q, n_>\} \le \min\{n_=, n_< + n_>\}$ swaps.
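Steps G1-G5 can be sketched with a counter of three-way key comparisons. The names `partition_g` and `vecswap`, the 0-based inclusive indices, and the choice to count each examination of a key against the pivot as one comparison are assumptions of this illustration.

```python
def vecswap(x, a, b, c):
    """Vector swap x[a..b] <-> x[b+1..c] (inclusive bounds)."""
    d = min(b + 1 - a, c - b)
    for t in range(d):
        x[a + t], x[c - t] = x[c - t], x[a + t]


def partition_g(x, l, r):
    """Sketch of Scheme G on x[l..r] (0-based, inclusive), pivot v = x[l].

    Returns (a, b, comparisons) with x[l..a-1] < v, x[a..b] == v, x[b+1..r] > v.
    """
    v = x[l]
    comparisons = 0
    i = p = l + 1                              # G1: initialize
    j = q = r
    while True:
        while i <= j:                          # G2
            comparisons += 1
            if x[i] < v:
                i += 1
            elif x[i] == v:
                x[p], x[i] = x[i], x[p]        # collect equal keys on the left
                p += 1
                i += 1
            else:
                break
        while i < j:                           # G3: strict test avoids re-comparing x[i]
            comparisons += 1
            if x[j] > v:
                j -= 1
            elif x[j] == v:
                x[j], x[q] = x[q], x[j]        # collect equal keys on the right
                j -= 1
                q -= 1
            else:
                break
        if i >= j:
            j = i - 1
            break
        x[i], x[j] = x[j], x[i]                # G4
        i += 1
        j -= 1
    a = l + i - p                              # G5: cleanup
    b = r - q + j
    vecswap(x, l, p - 1, j)
    vecswap(x, i, q, r)
    return a, b, comparisons
```

With this counting convention the counter equals `r - l` on return, i.e. one comparison per non-pivot key.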

Scheme G makes $n - 1$ comparisons, whereas the versions of [BeM93, Progs. 6 and 7],
[BeS97, §5], [Knu98, Ex. 5.2.2-41] make one spurious comparison when $i = j$ at step G3.
These versions correspond to replacing step G3 by

G3'. [Decrease $j$ until $x_j < v$.] If $i \le j$ and $x_j > v$, decrease $j$ by 1 and repeat this step.
If $i \le j$ and $x_j = v$, exchange $x_j \leftrightarrow x_q$, decrease $j$ and $q$ by 1, and repeat this step.
If $i > j$, go to G5.

Except for making a spurious comparison when $i = j$, step G3' acts like G3: If $i = j$,
then, since $x_i > v$ by G2, they exit to G5 with $j = i - 1$, whereas if $i > j$, then the general
invariant $i \le j + 1$ yields $i = j + 1$, and G3 maintains this equality.

Lemma 5.1. Let $c \in \{0, 1\}$ be the number of spurious comparisons made by scheme G
using step G3'. If the keys are distinct and in random order, then $E[c \mid j_v = j] = (n - 1 - j)/(n - 1)$
for $0 \le j < n$, and $Ec = 1 - Ej_v/(n - 1)$, where $j_v$ is the number of keys $< v$. In particular,
$Ec = 1/2$ when the pivot $v$ is the median-of-$s$ (for odd $s \ge 1$) or the ninther (cf. §3.3); in
these cases scheme G with step G3' makes on average $n - 1/2$ comparisons.

Proof. For distinct keys, the final $i = a + 1$ and $j = a$ at step G5. If $c = 1$, then $i = j$
and $x_i > v$ at G3' yield $i = a + 1 \le n$ and $x_{a+1} > v$. Conversely, suppose $a < n$ and
$x_{a+1} > v$ on input. If $x_{a+1}$ were compared to $v$ first at G3' for $j = a + 1 > i$, G3' would
set $j = a$ and exit to G5 (since G4 would decrease $j$ below $a$) with $i \le a$, a contradiction;
hence $x_{a+1}$ must be compared to $v$ first at G2 for $i = a + 1 \le j$, and again at G3'. Thus
$c = 1$ iff $a < n$ and $x_{a+1} > v$ on input. Consequently, for $j_v := a - 1 = j < n - 1$,
$E[c \mid j_v = j] = \Pr[x_{a+1} > v \mid j_v = j] = (n - 1 - j)/(n - 1)$ since there are $n - 1 - j$ high keys
in random order, and $E[c \mid j_v = n - 1] = 0$; the rest is straightforward. □
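
The conditional expectation in Lemma 5.1 can be checked by brute force on small inputs. The sketch below is only an illustration under stated assumptions: the pivot is the first key, keys are the distinct integers 1 through n, every permutation is equally likely, and the branches for keys equal to the pivot are omitted because they never fire for distinct keys. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def spurious_comparisons(keys):
    """Comparisons beyond n - 1 made by scheme G with step G3' (pivot keys[0])."""
    x = list(keys)
    v, n = x[0], len(x)
    i, j = 1, n - 1
    comparisons = 0
    while True:
        while i <= j:                 # G2
            comparisons += 1
            if x[i] < v:
                i += 1
            else:
                break
        while i <= j:                 # G3': tests i <= j instead of i < j
            comparisons += 1
            if x[j] > v:
                j -= 1
            else:
                break
        if i > j:
            break
        x[i], x[j] = x[j], x[i]       # G4
        i += 1
        j -= 1
    return comparisons - (n - 1)


def conditional_means(n):
    """Return {j: E[c | j_v = j]} over all permutations of 1..n."""
    totals = {j: 0 for j in range(n)}
    counts = {j: 0 for j in range(n)}
    for perm in permutations(range(1, n + 1)):
        j_v = perm[0] - 1             # number of keys below the pivot
        totals[j_v] += spurious_comparisons(perm)
        counts[j_v] += 1
    return {j: Fraction(totals[j], counts[j]) for j in range(n)}
```

Calling `conditional_means(4)` enumerates all 24 permutations and returns the exact conditional means as fractions, which can be compared with $(n - 1 - j)/(n - 1)$ for each $j$.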
5.4 Lomuto's ternary partition

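Our ternary extension of scheme D employs the following "strict" version of (2.5):

$$
x[l: \bar p] = v, \quad x[\bar p + 1: p] < v, \quad x[p + 1: i - 1] > v, \quad x[i: r] \text{ unexamined}
\quad \longrightarrow \quad
x[l: \bar p] = v, \quad x[\bar p + 1: p] < v, \quad x[p + 1: r] > v. \tag{5.4}
$$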



Scheme H (Lomuto's ternary partition).

H1. [Initialize.] Set $i := l + 1$ and $p := \bar p := l$.

H2. [Check if done.] If $i > r$, go to H4.

H3. [Exchange if necessary.] If $x_i < v$, increase $p$ by 1 and exchange $x_p \leftrightarrow x_i$. If $x_i = v$,
increase $\bar p$ and $p$ by 1 and exchange $x_p \leftrightarrow x_i$ and $x_{\bar p} \leftrightarrow x_p$. Increase $i$ by 1 and return
to H2.

H4. [Cleanup.] Set $a := l + p - \bar p$ and $b := p$. Exchange $x[l: \bar p] \leftrightarrow x[\bar p + 1: p]$.
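
A short Python sketch of scheme H follows, mainly to make the swap count concrete. It assumes 0-based inclusive bounds and a pivot already stored in `x[l]`; the second exchange in step H3 is taken to be $x_{\bar p} \leftrightarrow x_p$, and vacuous swaps (a key exchanged with itself) are counted like any other. The name `partition_h` is hypothetical.

```python
def partition_h(x, l, r):
    """Scheme H on x[l..r] with pivot v = x[l]; returns (a, b, swaps)."""
    v = x[l]
    p = pbar = l                      # H1: x[l..pbar] == v, x[pbar+1..p] < v
    swaps = 0
    for i in range(l + 1, r + 1):     # H2 and H3
        if x[i] < v:
            p += 1
            x[p], x[i] = x[i], x[p]
            swaps += 1
        elif x[i] == v:
            pbar += 1
            p += 1
            x[p], x[i] = x[i], x[p]          # bring the equal key next to the lows
            x[pbar], x[p] = x[p], x[pbar]    # then move it into the equal block
            swaps += 2
    a = l + p - pbar                  # H4
    b = p
    m = min(pbar - l + 1, p - pbar)   # exchange x[l: pbar] <-> x[pbar+1: p]
    for t in range(m):
        x[l + t], x[p - t] = x[p - t], x[l + t]
    swaps += m
    return a, b, swaps
```

For `x = [2, 1, 2, 3]`, `partition_h(x, 0, 3)` returns `(1, 2, 4)` and leaves `x == [1, 2, 2, 3]`; the four swaps agree with $n_< + 2(n_= - 1) + \min\{n_=, n_<\} = 1 + 2 + 1$.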

Scheme H makes $n_< + 2(n_= - 1) + \min\{n_=, n_<\}$ swaps. Using the arrangements
with the blocks ordered as

$$
x < v \;\mid\; x = v \;\mid\; x > v
$$

with obvious modifications, scheme H would make $n_= - 1 + 2n_<$ swaps.

5.5 Comparison of binary and ternary schemes 

When the keys are distinct, the binary schemes A, B, C, D are equivalent to their ternary
versions E, F, G, H in the sense that respective pairs of schemes (e.g., A and E) produce
identical partitions, making the same sequences of comparisons and swaps. Hence our
earlier results for the binary schemes extend to the ternary schemes by replacing A, B, C, D
with E, F, G, H, respectively. Since the overheads of the ternary schemes are relatively small, consisting
mostly of additional tests for equal keys, the ternary schemes should run almost as fast as
their binary counterparts in the case of distinct keys.

Let us highlight some differences when equal keys occur. Although schemes A and E
stop the scanning pointers $i$ and $j$ on the same keys, the exchange step of scheme A simply
swaps each key to the other side, whereas the exchange step of scheme E additionally swaps
equals to the ends. Schemes B and F behave similarly. However, in contrast with scheme C,
scheme G never swaps equals to the other side. For instance, when all the keys are equal,
scheme E makes $\lfloor n/2 - 1 \rfloor$ usual swaps and $2\lfloor n/2 - 1 \rfloor$ vacuous swaps, scheme F makes
$\lfloor (n-1)/2 \rfloor$ usual swaps and $2\lfloor (n-1)/2 \rfloor$ vacuous swaps, scheme G makes just $n - 1$
vacuous swaps, and scheme H makes $2(n - 1)$ vacuous swaps.
5.6 Preventing vacuous swaps of equal keys

Steps G2 and G3 of scheme G have two drawbacks: they make vacuous swaps when $i = p$
and $j = q$, and they need the tests "$i \le j$" and "$i < j$". These drawbacks are eliminated
in the following two-phase scheme, which runs first a special version of scheme G that
doesn't make vacuous swaps until it finds two keys $x_i < v < x_j$. Afterwards no vacuous
swaps occur (because $p < i$, $j < q$) and the pointer tests are unnecessary (since $x_j > v$
stops the $i$-loop, and $x_{i-1} < v$ stops the $j$-loop).

Scheme I (Hybrid ternary partition).

I1. [Initialize.] Set $i := l + 1$ and $j := q := r$.

I2. [Increase $i$ until $x_i \ne v$.] If $i \le j$ and $x_i = v$, increase $i$ by 1 and repeat this step.
Set $p := i$. If $i = j$, set $i := j + 1$ if $x_i < v$, $j := i - 1$ otherwise. If $i > j$, go to I12.

I3. [Decrease $j$ until $x_j \ne v$.] If $i < j$ and $x_j = v$, decrease $j$ by 1 and repeat this step.
Set $q := j$. If $i = j$, set $i := j + 1$ if $x_i < v$, $j := i - 1$ otherwise, and go to I12.

I4. [Decide which steps to skip.] If $x_i < v$ and $x_j < v$, go to I5. If $x_i > v$ and $x_j > v$, go
to I6. If $x_i > v$ and $x_j < v$, go to I7. If $x_i < v$ and $x_j > v$, go to I8.

I5. [Increase $i$ until $x_i > v$.] Increase $i$ by 1. If $i < j$ and $x_i < v$, repeat this step. If
$i < j$ and $x_i = v$, exchange $x_p \leftrightarrow x_i$, increase $p$ by 1, and repeat this step. (At this
point, $x_j < v$.) If $i < j$, go to I7. Set $i := j + 1$ and go to I12.

I6. [Decrease $j$ until $x_j < v$.] Decrease $j$ by 1. If $i < j$ and $x_j > v$, repeat this step. If
$i < j$ and $x_j = v$, exchange $x_j \leftrightarrow x_q$, decrease $q$ by 1, and repeat this step. (At this
point, $x_i > v$.) If $i = j$, set $j := i - 1$ and go to I12.

I7. [Exchange.] (At this point, $i < j$ and $x_i > v > x_j$.) Exchange $x_i \leftrightarrow x_j$.

I8. [End of first stage.] (At this point, $x_i < v < x_j$ and $p \le i < j \le q$.)

I9. [Increase $i$ until $x_i > v$.] Increase $i$ by 1. If $x_i < v$, repeat this step. If $x_i = v$,
exchange $x_p \leftrightarrow x_i$, increase $p$ by 1, and repeat this step.

I10. [Decrease $j$ until $x_j < v$.] Decrease $j$ by 1. If $x_j > v$, repeat this step. If $x_j = v$,
exchange $x_j \leftrightarrow x_q$, decrease $q$ by 1, and repeat this step.

I11. [Exchange.] If $i < j$, exchange $x_i \leftrightarrow x_j$ and return to I9.

I12. [Cleanup.] Set $a := l + i - p$ and $b := r - q + j$. Exchange $x[l: p-1] \leftrightarrow x[p: j]$ and
$x[i: q] \leftrightarrow x[q+1: r]$.

Scheme I makes $n + 1$ comparisons, or just $n - 1$ if it finishes in the first stage before
reaching step I9. The two extraneous comparisons can be eliminated by keeping the
strategy of scheme G in the following modification.

Scheme J (Extended double-index controlled ternary partition).

Use scheme I with steps I8 through I11 replaced by the following steps.

J8. [End of first stage.] Increase $i$ by 1 and decrease $j$ by 1.

J9. [Increase $i$ until $x_i > v$.] If $i \le j$ and $x_i < v$, increase $i$ by 1 and repeat this step. If
$i \le j$ and $x_i = v$, exchange $x_p \leftrightarrow x_i$, increase $p$ and $i$ by 1, and repeat this step.

J10. [Decrease $j$ until $x_j < v$.] If $i < j$ and $x_j > v$, decrease $j$ by 1 and repeat this step.
If $i < j$ and $x_j = v$, exchange $x_j \leftrightarrow x_q$, decrease $j$ and $q$ by 1, and repeat this step.
If $i \ge j$, set $j := i - 1$ and go to I12.

J11. [Exchange.] Exchange $x_i \leftrightarrow x_j$, increase $i$ by 1, decrease $j$ by 1, and return to J9.

Schemes I and J are equivalent in the sense of producing identical partitions via the
same sequences of swaps. Further, barring vacuous swaps, scheme G is equivalent to
schemes I and J in the following cases: (a) all keys are equal; (b) $x_r \ne v$ (e.g., the keys are
distinct); (c) there is at least one high key $> v$. In the remaining degenerate case where
the keys aren't equal, $x_r = v$ and there are no high keys, scheme G produces $i = r + 1$ and
$j = r$ on the first pass, whereas step I3 finds $j < r$, and either I3 or I5 produce $i = j + 1$
(i.e., scheme G swaps $r - j$ more equal keys to the left end).

If the first stage of schemes I and J is implemented by a more straightforward adaptation
of scheme G, we obtain the following variants.

Scheme K (Alternative hybrid ternary partition).

Use scheme I with steps I2 through I6 replaced by the following steps.

K2. [Increase $i$ until $x_i \ne v$.] If $i \le j$ and $x_i = v$, increase $i$ by 1 and repeat this step.
Set $p := i$. If $i \le j$ and $x_i < v$, increase $i$ by 1 and go to K3; otherwise go to K4.

K3. [Increase $i$ until $x_i > v$.] If $i \le j$ and $x_i < v$, increase $i$ by 1 and repeat this step. If
$i \le j$ and $x_i = v$, exchange $x_p \leftrightarrow x_i$, increase $p$ and $i$ by 1, and repeat this step.

K4. [Decrease $j$ until $x_j \ne v$.] If $i < j$ and $x_j = v$, decrease $j$ by 1 and repeat this step.
Set $q := j$. If $i < j$ and $x_j > v$, decrease $j$ by 1 and go to K5. If $i < j$ and $x_j < v$, go
to I7. Set $j := i - 1$ and go to I12.

K5. [Decrease $j$ until $x_j < v$.] If $i < j$ and $x_j > v$, decrease $j$ by 1 and repeat this step.
If $i < j$ and $x_j = v$, exchange $x_j \leftrightarrow x_q$, decrease $j$ and $q$ by 1, and repeat this step.
If $i \ge j$, set $j := i - 1$ and go to I12.

Scheme L (Two-stage double-index controlled ternary partition).

Use scheme I with steps I2 through I6 replaced by steps K2 through K5 of scheme K and
steps I8 through I11 replaced by steps J8 through J11 of scheme J.

In other words, scheme L is obtained from scheme G by using special versions of steps
G2 and G3 on the first pass, with each step split into two substeps to avoid vacuous swaps.

Except for avoiding vacuous swaps, schemes K and L are equivalent to scheme G.
Hence schemes G, I, J, K and L are equivalent except for the degenerate case discussed
after scheme J; in this case, schemes I and J swap fewer equal keys than schemes K and L.
Another significant difference between schemes I and K is that scheme I may be quicker
in reaching the second stage where the tests "$i \le j$" and "$i < j$" aren't needed. (In fact
scheme I reaches the second stage faster than scheme K iff $x_i < v < x_j$ occurs at step I4 of
scheme I; in the remaining three cases of I4, both schemes act equivalently.)
Figure 5.1: Decision tree for median of three
5.7 Using sample elements in tripartitioning 

In parallel with §4, we now show how to tune the ternary schemes when the pivot $v$ is
selected as the $(p + 1)$th element in a sample of size $s$, assuming $0 \le p < s \le n$ and
$q := s - 1 - p \ge 0$.

First, suppose that after pivot selection, we have the following arrangement:

$$
x[l: \bar l - 1] < v, \quad x[\bar l: \bar p - 1] = v, \quad x[\bar p: \bar q] \text{ unexamined}, \quad x[\bar q + 1: \bar r] = v, \quad x[\bar r + 1: r] > v \tag{5.6}
$$

with $\bar p := l + p + 1$, $\bar q := r - q$; then we only need to partition the array $x[\bar p - 1: \bar q]$ of size
$\bar n := n - s + 1$. The ternary schemes are modified as follows.

In step E1 of scheme E, set $i := \bar p - 1$ and $j := \bar q + 1$; in the cleanup step replace $l$, $r$ by $\bar l$, $\bar r$. The
same scheme results from scheme F after analogous changes and omitting the test "$i < r$"
from its scanning loop. Similarly, in step G1 of scheme G set $i := p := \bar p$ and $j := q := \bar q$; in step G5
replace $l$, $r$ by $\bar l$, $\bar r$. Steps I1 and I12 of schemes I through L are modified in the same way.
Finally, in step H1 of scheme H set $i := \bar p$ and $p := \bar p := i - 1$; in step H2 replace $r$ by $\bar q$; in step H4 set
$a := \bar l + p - \bar p$, $b := p - \bar q + \bar r$ and exchange $x[\bar l: \bar p] \leftrightarrow x[\bar p + 1: p]$ and $x[p + 1: \bar q] \leftrightarrow x[\bar q + 1: \bar r]$.

When the keys are distinct, we have $\bar l = l + p$, $\bar p = \bar l + 1$ and $\bar q = \bar r = r - q$ in (5.6), so
that the modified ternary schemes are equivalent to the corresponding binary schemes as modified
in §4 (whose $p$, $q$ correspond to the current $\bar p$, $\bar q$).

For the median-of-3 selection ($p = q = 1$, $\bar p = l + 2$, $\bar q = r - 1$), we may rearrange the
sample keys $x_l$, $x_{l+1}$, $x_r$ and find $\bar l$, $\bar r$ according to Figure 5.1. (For simplicity, as with Fig.
4.1, the left subtree may be used after exchanging $a \leftrightarrow c$ when $a > c$.)

As in §4, even if pivot selection doesn't rearrange the array except for placing the pivot
in $x_l$, scheme E may be simplified by replacing step E1 with step F1; the same scheme is
obtained from scheme F by omitting the test "$i < r$" from its scanning loop.



6 Experimental results 
6.1 Implemented algorithms 

We now sketch the algorithms used in our experiments, starting with a nonrecursive version
of quickselect that employs a random pivot and one of the ternary schemes of §5.

Algorithm 6.1 (Quickselect($x$, $n$, $k$) for selecting the $k$th smallest of $x[1: n]$).

Step 1 (Initialize). Set $l := 1$ and $r := n$.

Step 2 (Handle small file). If $l < r$, go to Step 3. If $l > r$, set $k_- := r + 1$ and $k_+ := l - 1$.
If $l = r$, set $k_- := k_+ := k$. Return.

Step 3 (Select pivot). Pick a random integer $i \in [l, r]$, swap $x_l \leftrightarrow x_i$ and set $v := x_l$.

Step 4 (Partition). Partition the array $x[l: r]$ to produce the arrangement (5.3).

Step 5 (Update bounds). If $a \le k$, set $l := b + 1$. If $k \le b$, set $r := a - 1$. Go to Step 2.

Steps 2 and 5 ensure that on exit $x[1: k_- - 1] < x[k_-: k_+] < x[k_+ + 1: n]$, $k_- \le k \le k_+$.
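
The control flow of Algorithm 6.1 can be sketched as below. This is an illustrative Python rendering, not the Fortran 77 code used in the experiments: indices are 0-based, and `ternary_partition` is a plain three-way scan that merely stands in for one of the schemes of §5 (any routine returning the bounds $a$, $b$ of the block equal to the pivot would do). Both function names are hypothetical, and $0 \le k < n$ is assumed.

```python
import random


def ternary_partition(x, l, r):
    """Partition x[l..r] around v = x[l] into < v | == v | > v; return (a, b)."""
    v = x[l]
    lt, i, gt = l, l + 1, r
    while i <= gt:
        if x[i] < v:
            x[lt], x[i] = x[i], x[lt]
            lt += 1
            i += 1
        elif x[i] > v:
            x[i], x[gt] = x[gt], x[i]
            gt -= 1
        else:
            i += 1
    return lt, gt


def quickselect(x, k, rng=random):
    """Permute x in place and return (k_minus, k_plus) for the k-th smallest key."""
    l, r = 0, len(x) - 1                    # Step 1
    while l < r:                            # Step 2
        i = rng.randint(l, r)               # Step 3: random pivot moved to x[l]
        x[l], x[i] = x[i], x[l]
        a, b = ternary_partition(x, l, r)   # Step 4
        if a <= k:                          # Step 5: both tests use the same a, b
            l = b + 1
        if k <= b:
            r = a - 1
    if l > r:                               # k fell inside the equal block
        return r + 1, l - 1
    return k, k
```

On exit, every key in `x[k_minus: k_plus + 1]` equals the selected key, everything to the left is smaller, and everything to the right is larger, so a caller learns the full range of ranks occupied by the selected value.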

The median-of-3 version works as follows. If $l = r - 1$ at Step 2, we swap $x_l \leftrightarrow x_r$ if
$x_l > x_r$, set $k_- := l$ and $k_+ := r$ if $x_l = x_r$, $k_- := k_+ := k$ otherwise, and return. At
Step 3, we swap $x_l$, $x_{l+1}$, $x_r$ with random keys in $x[l: r]$, $x[l + 1: r]$ and $x[l + 2: r]$, respectively. After
sorting the sample keys $x_l$, $x_{l+1}$, $x_r$ and finding $\bar l$, $\bar r$ according to Fig. 5.1, we set
$v := x_{l+1}$. Then Step 4 uses one of the modified ternary schemes of §5.7.

When a binary scheme is employed, we omit $k_-$ and $k_+$, use Fig. 4.1 instead of Fig.
5.1, and the modified schemes of §4 with $\bar l := l + 1$, $\bar r := r - 1$ for the median-of-3.

Our implementations of Quickselect were programmed in Fortran 77 and run on a
notebook PC (Pentium 4M 2 GHz, 768 MB RAM) under MS Windows XP. We used a
double precision input array $x[1: n]$, in-line comparisons and swaps; future work should
test tuned comparison and swap functions for other data types (cf. [BeM93]).

6.2 Testing examples 

We used minor modifications of the input sequences of [Val00], defined as follows:

random A random permutation of the integers 1 through $n$.

mod-m A random permutation of the sequence $i \bmod m$, $i = 1: n$, called binary (ternary,
quadrary, quintary) when $m = 2$ (3, 4, 5, respectively).

sorted The integers 1 through $n$ in increasing order.

rotated A sorted sequence rotated left once; i.e., $(2, 3, \ldots, n, 1)$.

organpipe The integers $(1, 2, \ldots, n/2, n/2, \ldots, 2, 1)$.

m3killer Musser's "median-of-3 killer" sequence with $n = 4j$ and $k = n/2$:

    position:  1    2    3    4   ...   k-2    k-1   k    k+1   ...
    key:       1   k+1   3   k+3  ...   2k-3   k-1   2     4    ...

twofaced Obtained by randomly permuting the elements of an m3killer sequence in
positions $4\lfloor \log_2 n \rfloor$ through $n/2 - 1$ and $n/2 + 4\lfloor \log_2 n \rfloor - 1$ through $n - 2$.

For each input sequence, its (lower) median element was selected for $k := \lceil n/2 \rceil$.

These input sequences were designed to test the performance of selection algorithms
under a range of conditions. In particular, the binary sequences represent inputs
containing many duplicates [Sed77]. The rotated and organpipe sequences are difficult for
many implementations of quickselect. The m3killer and twofaced sequences are hard for
implementations with median-of-3 pivots (their original versions [Mus97] were modified to
become difficult when the middle element comes from position $k$ instead of $k + 1$).
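
The simpler test sequences can be generated along the following lines. This Python sketch is only an illustration (the experiments themselves used Fortran 77): the function name `make_input` and the string labels are hypothetical, `n` is assumed even for organpipe, and the m3killer and twofaced sequences are left out because their full definition is not reproduced here.

```python
import random


def make_input(kind, n, rng=random):
    """Return one test sequence of length n as a list of integers."""
    if kind == "random":
        seq = list(range(1, n + 1))
        rng.shuffle(seq)
    elif kind.startswith("mod-"):
        m = int(kind[4:])                       # e.g. "mod-2" is the binary input
        seq = [i % m for i in range(1, n + 1)]
        rng.shuffle(seq)
    elif kind == "sorted":
        seq = list(range(1, n + 1))
    elif kind == "rotated":
        seq = list(range(2, n + 1)) + [1]       # sorted sequence rotated left once
    elif kind == "organpipe":
        half = list(range(1, n // 2 + 1))
        seq = half + half[::-1]                 # 1, 2, ..., n/2, n/2, ..., 2, 1
    else:
        raise ValueError(f"unknown sequence kind: {kind}")
    return seq
```

Passing a seeded `random.Random` instance as `rng` makes the random and mod-m instances reproducible across runs.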

6.3 Computational results 

We varied the input size n from 50,000 to 16,000,000. For the random, mod-m and twofaced 
sequences, for each input size, 20 instances were randomly generated; for the deterministic 
sequences, 20 runs were made to measure the solution time. 

Table 6.1 summarizes the performance of four schemes used in Quickselect with
median-of-3. The average, maximum and minimum solution times are in milliseconds (in
general, they grow linearly with $n$, and can't be measured accurately for small inputs;
hence only large inputs are included, with 1M := $10^6$). The comparison counts are in
multiples of $n$; e.g., column seven gives $C_{avg}/n$, where $C_{avg}$ is the average number of
comparisons made over all instances. Further, $P_{avg}$ is the average number of partitions in
units of $\ln n$, $S_{avg}$ and $S^0_{avg}$ are the average numbers of all swaps and of vacuous swaps in
units of $n$, and the final column gives the average number of swaps per comparison. Note
that for random inputs with distinct keys, quickselect with median-of-3 takes on average
2.75n + o(n) comparisons and y Inn + o(n) partitions |Grii99t IKMP97j . and thus about 
$0.55n$ swaps when there are 1/5 swaps per comparison; e.g., for schemes A, E and G.

For each scheme (and others not included in Tab. 6.1), the results for the twofaced and
m3killer inputs were similar to those for the random and organpipe inputs, respectively.
The sorted and rotated inputs were solved about twice as fast as the random inputs.

Recall that in tuned versions, scheme B coincides with A and scheme F with E.

The run times of schemes O and EH were similar to those of schemes [X] and [II respectively; 
in other words, the inclusion of pointer tests in the key comparison loops didn't result in 
significant slowdowns. Also their comparison and swap counts were similar. 

Due to additional tests for equal keys, the ternary schemes were slower than their
binary counterparts on the inputs with distinct keys. Yet the slowdowns were quite mild
(e.g., about ten percent for scheme E vs. A) and could be considered a fair price for being
able to identify all keys equal to the selected one. On the inputs with multiple equal keys,
the numbers of comparisons made by the binary schemes A and C were similar to those
made on the random inputs, but the numbers of swaps increased up to $n$. In contrast, the
ternary schemes E and G took significantly fewer comparisons and more swaps. Scheme E
produced the largest numbers of swaps, but was still faster than schemes G and I, whereas
scheme I was noticeably faster than scheme G due to the elimination of vacuous swaps.

On the inputs with distinct keys, Lomuto's scheme O was about sixty percent slower 
than scheme El making about half as many swaps as comparisons (cf. §S J3.3. II and On 
the inputs with multiple equal keys, scheme D was really bad: once the current array
$x[l: r]$ contains only keys equal to the $k$th smallest, each partition removes two keys, so
the running time may be quadratic in the number of equal keys. For instance, on a binary
input with $k = n/2$, at least $n(n + 20)/16 - 2$ comparisons are used (if the first $v = 1$, we
get $l = 1$, $r = k$, and then $l$ increases by 2 while $r = k$; otherwise the cost is greater).

Our results were similar while using the classic random pivot instead of the median-of-3. 
Table 6.1: Performance of schemes A, E, G, I with median-of-3.

Scheme  Sequence   Size   Time [msec]         Comparisons [n]     P_avg   S_avg  S^0_avg  S_avg
                   n      avg    max   min    avg   max   min     [ln n]  [n]    [n]      [C_avg]
A       random     8M     252    360   170    2.59  3.96  1.78    1.64    0.55   0.00     0.21
                   16M    494    641   371    2.57  3.46  1.93    1.57    0.53   0.00     0.21
        organpipe  8M     173    250   111    2.64  4.10  1.77    1.53    0.57   0.00     0.22
                   16M    355    460   270    2.61  3.49  1.94    1.62    0.60   0.00     0.23
        binary     8M     254    271   250    2.73  2.92  2.68    1.86    1.00   0.00     0.37
                   16M    506    521   500    2.70  2.79  2.68    1.87    1.00   0.00     0.37
        ternary    8M     246    321   171    2.44  3.27  1.75    1.33    0.82   0.00     0.34
                   16M    452    620   360    2.22  3.11  1.75    1.29    0.76   0.00     0.34
        quadrary   8M     277    340   230    2.78  3.44  2.26    1.83    0.86   0.00     0.31
                   16M    537    671   460    2.65  3.37  2.26    1.85    0.84   0.00     0.32
        quintary   8M     231    350   180    2.31  3.56  1.85    1.34    0.69   0.00     0.30
                   16M    486    671   330    2.44  3.49  1.67    1.36    0.71   0.00     0.29
E       random     8M     284    391   201    2.59  3.96  1.78    1.64    0.55   0.00     0.21
                   16M    550    711   411    2.57  3.46  1.93    1.57    0.53   0.00     0.21
        organpipe  8M     232    321   120    2.73  5.27  1.84    1.54    0.57   0.00     0.21
                   16M    421    571   320    2.92  4.62  1.90    1.58    0.59   0.00     0.20
        binary     8M     205    231   170    1.28  1.50  1.00    0.10    1.41   0.61     1.11
                   16M    381    471   350    1.13  1.50  1.00    0.08    1.19   0.46     1.06
        ternary    8M     259    281   240    1.47  2.00  1.00    0.12    1.37   0.37     0.93
                   16M    505    590   480    1.37  2.00  1.00    0.10    1.25   0.28     0.91
        quadrary   8M     262    331   210    1.60  2.50  1.00    0.12    1.33   0.28     0.83
                   16M    559    661   410    1.66  2.25  1.00    0.13    1.35   0.31     0.81
        quintary   8M     283    370   210    1.52  2.40  1.00    0.13    1.14   0.14     0.75
                   16M    582    731   420    1.55  2.40  1.00    0.14    1.13   0.14     0.73
G       random     8M     301    411   210    2.59  3.96  1.78    1.64    0.55   0.00     0.21
                   16M    587    761   430    2.57  3.46  1.93    1.57    0.53   0.00     0.21
        organpipe  8M     186    250   110    2.88  4.20  1.91    1.55    0.61   0.00     0.21
                   16M    378    511   270    2.77  3.93  1.97    1.59    0.59   0.00     0.21
        binary     8M     293    331   250    1.27  1.50  1.00    0.10    1.27   0.27     1.00
                   16M    549    671   500    1.12  1.50  1.00    0.08    1.12   0.13     1.00
        ternary    8M     340    420   250    1.47  2.00  1.00    0.12    1.21   0.10     0.82
                   16M    646    811   501    1.53  2.00  1.00    0.11    1.26   0.10     0.82
        quadrary   8M     311    450   220    1.42  2.25  1.00    0.12    1.02   0.07     0.72
                   16M    665    972   440    1.55  2.50  1.00    0.13    1.13   0.09     0.73
        quintary   8M     319    451   220    1.47  2.00  1.00    0.13    0.96   0.07     0.65
                   16M    644    1021  440    1.61  2.80  1.00    0.13    0.97   0.04     0.60
I       random     8M     275    381   190    2.59  3.96  1.78    1.64    0.55   0.00     0.21
                   16M    536    681   391    2.57  3.46  1.93    1.57    0.53   0.00     0.21
        organpipe  8M     183    240   110    2.88  4.20  1.91    1.55    0.61   0.00     0.21
                   16M    357    461   260    2.77  3.93  1.97    1.59    0.59   0.00     0.21
        binary     8M     245    261   230    1.27  1.50  1.00    0.10    1.00   0.00     0.78
                   16M    500    530   480    1.12  1.50  1.00    0.08    1.00   0.00     0.89
        ternary    8M     323    391   230    1.47  2.00  1.00    0.12    1.11   0.00     0.76
                   16M    620    761   470    1.53  2.00  1.00    0.11    1.16   0.00     0.76
        quadrary   8M     292    440   200    1.43  2.25  1.00    0.12    0.95   0.00     0.66
                   16M    630    922   420    1.55  2.50  1.00    0.13    1.04   0.00     0.67
        quintary   8M     297    431   200    1.47  2.00  1.00    0.13    0.89   0.00     0.60
                   16M    614    1042  411    1.61  2.80  1.00    0.13    0.93   0.00     0.58

Then, for random inputs with distinct keys, quickselect takes on average $2(1 + \ln 2)n + o(n)$
comparisons [Knu98, Ex. 5.2.2-32], and thus about $0.564n$ swaps when there are 1/6 swaps
per comparison. Hence, not surprisingly, the running times and comparison counts on the
inputs with distinct keys increased by between 14 and 20 percent, but all the schemes had
essentially the same relative merits and drawbacks as in the median-of-3 case above.